CEFR level
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
Kikuchi, Masato, Ono, Masatsugu, Soga, Toshioki, Tanabe, Tetsu, Ozono, Tadachika
Although WordNet is a valuable resource owing to its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this, we developed a WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our method, we constructed a large-scale corpus containing both sense and CEFR-level information from our annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on our corpus perform comparably to those trained on gold-standard annotations. Furthermore, by combining our corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81, indicating the high accuracy of our annotations. Our annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
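The core annotation step described above, matching each WordNet sense definition to the closest English Vocabulary Profile entry by semantic similarity, can be sketched as follows. The paper measures similarity with a large language model; this illustration substitutes a toy bag-of-words cosine similarity, and the EVP entries shown are hypothetical examples, not actual EVP data.

```python
from collections import Counter
import math

def bow_vector(text):
    """Lowercase bag-of-words vector (a stand-in for an LLM-based embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_cefr(sense_gloss, evp_entries):
    """Return the CEFR level of the EVP entry whose definition is most
    similar to the WordNet sense gloss."""
    gloss_vec = bow_vector(sense_gloss)
    best = max(evp_entries,
               key=lambda e: cosine(gloss_vec, bow_vector(e["definition"])))
    return best["level"]

# Hypothetical EVP-style entries for the headword "bank"
evp = [
    {"definition": "an organization that keeps money for people", "level": "A1"},
    {"definition": "the land along the side of a river", "level": "B1"},
]
level = assign_cefr("sloping land beside a body of water such as a river", evp)
```

Replacing `cosine` over bag-of-words vectors with LLM-judged similarity, as the paper does, keeps the same matching structure while handling paraphrase and polysemy far better.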
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Lee, Jiyoung, Kim, Seungho, Han, Jieun, Lee, Jun-Min, Kim, Kitaek, Oh, Alice, Choi, Edward
Large Language Models (LLMs) are predominantly evaluated on Standard American English (SAE), often overlooking the diversity of global English varieties. This narrow focus may raise fairness concerns as degraded performance on non-standard varieties can lead to unequal benefits for users worldwide. Therefore, it is critical to extensively evaluate the linguistic robustness of LLMs on multiple non-standard English varieties. We introduce Trans-EnV, a framework that automatically transforms SAE datasets into multiple English varieties to evaluate the linguistic robustness. Our framework combines (1) linguistics expert knowledge to curate variety-specific features and transformation guidelines from linguistic literature and corpora, and (2) LLM-based transformations to ensure both linguistic validity and scalability. Using Trans-EnV, we transform six benchmark datasets into 38 English varieties and evaluate seven state-of-the-art LLMs. Our results reveal significant performance disparities, with accuracy decreasing by up to 46.3% on non-standard varieties. These findings highlight the importance of comprehensive linguistic robustness evaluation across diverse English varieties. Each construction of Trans-EnV was validated through rigorous statistical testing and consultation with a researcher in the field of second language acquisition, ensuring its linguistic validity. Our code and datasets are publicly available at https://github.com/jiyounglee-0523/TransEnV and https://huggingface.co/collections/jiyounglee0523/transenv-681eadb3c0c8cf363b363fb1.
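Step (2) of the framework, composing curated variety-specific feature guidelines into an LLM transformation prompt, might be structured as in the sketch below. The variety name and feature entry are placeholders for illustration, not items from the actual Trans-EnV inventory.

```python
def build_transform_prompt(sae_text, variety, features):
    """Compose an LLM prompt that rewrites SAE text into a target variety,
    constrained to documented, linguistically curated features."""
    guidelines = "\n".join(
        f"- {f['name']}: {f['guideline']}" for f in features[variety])
    return (
        f"Rewrite the following Standard American English text as {variety}, "
        f"applying only these documented features:\n{guidelines}\n\n"
        f"Text: {sae_text}"
    )

# Hypothetical feature inventory for one variety
features = {
    "Variety X": [
        {"name": "zero copula",
         "guideline": "omit 'is/are' in present-tense predicates "
                      "where the variety allows it"},
    ],
}
prompt = build_transform_prompt("She is very busy today.", "Variety X", features)
```

Keeping the feature inventory as data separate from the prompt template is what makes the approach scale to 38 varieties: each new variety only adds rows to the inventory.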
An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
Lo, Tien-Hong, Chen, Szu-Yu, Sung, Yao-Ting, Chen, Berlin
A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without underlying assumptions of feature curation. However, speech-based SSL models capture acoustic-related traits but overlook linguistic content, while text-based SSL models rely on ASR output and fail to encode prosodic nuances. Moreover, most prior art treats proficiency levels as nominal classes, ignoring their ordinal structure and the non-uniform intervals between proficiency labels. To address these limitations, we propose an effective ASA approach combining SSL with handcrafted indicator features via a novel modeling paradigm. We further introduce a multi-margin ordinal loss that jointly models both the score ordinality and non-uniform intervals of proficiency labels. Extensive experiments on the TEEMI corpus show that our method consistently outperforms strong baselines and generalizes well to unseen prompts.
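One common threshold-based way to realize an ordinal loss with per-boundary margins, of the kind the abstract describes, is sketched below. The paper's exact formulation may differ; this is an illustrative hinge-style variant in which unequal margins encode the non-uniform intervals between proficiency labels.

```python
def multi_margin_ordinal_loss(score, label, thresholds, margins):
    """Hinge-style ordinal loss with per-boundary margins (illustrative).

    thresholds[k] separates level k from level k+1 on the score axis;
    margins[k] may differ across boundaries, so adjacent proficiency
    levels need not be equally spaced.
    """
    loss = 0.0
    for k, (b, m) in enumerate(zip(thresholds, margins)):
        # The score should sit at least m above b if the true label is
        # beyond boundary k, and at least m below b otherwise.
        direction = 1.0 if label > k else -1.0
        loss += max(0.0, m - direction * (score - b))
    return loss
```

For example, with thresholds `[1.0, 2.5]` and margins `[0.5, 1.0]`, a score of 1.8 for label 1 clears the lower boundary but sits only 0.7 below the upper one, incurring a loss of 0.3 from the second margin alone.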
UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
Imperial, Joseph Marvin, Barayan, Abdullah, Stodden, Regina, Wilkens, Rodrigo, Sanchez, Ricardo Munoz, Gao, Lingyun, Torgbi, Melissa, Knight, Dawn, Forey, Gail, Jablonkai, Reka R., Kochmar, Ekaterina, Reynolds, Robert, Ribeiro, Eugénio, Saggion, Horacio, Volodina, Elena, Vajjala, Sowmya, François, Thomas, Alva-Manchego, Fernando, Madabushi, Harish Tayyar
We introduce UniversalCEFR, a large-scale multilingual and multidimensional dataset of texts annotated with CEFR (Common European Framework of Reference) levels in 13 languages. To enable open research in automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modelling across tasks and languages. To demonstrate its utility, we conduct benchmarking experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results support using linguistic features and fine-tuning pretrained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution for language proficiency research by standardising dataset formats, and promoting their accessibility to the global research community.
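A unified record format of the kind the abstract describes might look like the following sketch. The field names here are assumptions for illustration, not the dataset's actual schema; consult the released UniversalCEFR data for the real format.

```python
from dataclasses import dataclass, asdict

@dataclass
class CEFRRecord:
    """One possible shape for a standardized CEFR-labeled text record."""
    text: str          # the annotated text itself
    cefr_level: str    # one of A1, A2, B1, B2, C1, C2
    language: str      # ISO 639-1 code of the text's language
    source: str        # originating educational or learner resource
    dimension: str     # e.g. readability vs. learner production

rec = CEFRRecord(
    text="Der Hund schläft.",
    cefr_level="A1",
    language="de",
    source="example-corpus",
    dimension="readability",
)
```

A flat, typed record like this is what lets one pipeline process 13 languages consistently: every downstream model sees the same fields regardless of where a text was curated from.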
Towards Ontology-Based Descriptions of Conversations with Qualitatively-Defined Concepts
Gendron, Barbara, Guibon, Gaël, D'aquin, Mathieu
The controllability of Large Language Models (LLMs) when used as conversational agents is a key challenge, particularly to ensure predictable and user-personalized responses. This work proposes an ontology-based approach to formally define conversational features that are typically qualitative in nature. By leveraging a set of linguistic descriptors, we derive quantitative definitions for qualitatively-defined concepts, enabling their integration into an ontology for reasoning and consistency checking. We apply this framework to the task of proficiency-level control in conversations, using CEFR language proficiency levels as a case study. These definitions are then formalized in description logic and incorporated into an ontology, which guides controlled text generation of an LLM through fine-tuning. Experimental results demonstrate that our approach provides consistent and explainable proficiency-level definitions, improving transparency in conversational AI.
Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
Almasi, Mina, Kristensen-McLachlan, Ross Deans
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles with separate chat histories. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels (A1, B1, C1). Our findings suggest that while system prompting can be used to constrain model outputs, prompting alone is too brittle for sustained, long-term interactional contexts, a phenomenon we term alignment drift. Our results provide insights into the feasibility of LLMs as personalized, proficiency-aligned adaptive tutors and offer a scalable method for low-cost evaluation of model performance without human participants.
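The dialogue-simulation setup described above, one LLM alternating between tutor and student roles with separate chat histories, can be sketched as follows. The model calls are stubbed and all names are illustrative, not the paper's actual implementation.

```python
def simulate_dialogue(tutor_llm, student_llm, level, turns):
    """Alternate tutor/student turns; each role keeps its own chat history."""
    tutor_history = [{"role": "system",
                      "content": f"You are a Spanish tutor. "
                                 f"Use only CEFR {level} language."}]
    student_history = [{"role": "system",
                        "content": f"You are a Spanish learner at CEFR {level}."}]
    tutor_outputs = []
    message = "Hola, ¿empezamos?"
    for _ in range(turns):
        tutor_history.append({"role": "user", "content": message})
        reply = tutor_llm(tutor_history)      # tutor sees only its own history
        tutor_history.append({"role": "assistant", "content": reply})
        tutor_outputs.append(reply)           # later scored for level drift
        student_history.append({"role": "user", "content": reply})
        message = student_llm(student_history)
        student_history.append({"role": "assistant", "content": message})
    return tutor_outputs

# Stub models for illustration; real runs would call instruction-tuned LLMs
def tutor_stub(history): return f"t{len(history)}"
def student_stub(history): return f"s{len(history)}"
outputs = simulate_dialogue(tutor_stub, student_stub, "B1", turns=2)
```

Collecting only the tutor-side outputs is the key design point: drift is measured on what the tutor produces over time, while the simulated student merely keeps the conversation going.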
Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
Bannò, Stefano, Knill, Kate, Gales, Mark
Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent, or part-of-speech (PoS) related use of words. This paper introduces a novel approach to enable fine-grained vocabulary evaluation exploiting the precise use of words within a sentence. The scheme combines large language models (LLMs) with the English Vocabulary Profile (EVP). The EVP is a standard lexical resource that enables in-context vocabulary use to be linked with proficiency level. We evaluate the ability of LLMs to assign proficiency levels to individual words as they appear in L2 learner writing, addressing key challenges such as polysemy, contextual variation, and multi-word expressions. We compare LLMs to a PoS-based baseline. LLMs appear to exploit additional semantic information that yields improved performance. We also explore correlations between word-level proficiency and essay-level proficiency. Finally, the approach is applied to examine the consistency of the EVP proficiency levels. Results show that LLMs are well-suited for the task of vocabulary assessment.
ARWI: Arabic Write and Improve
Chirkunov, Kirill, Alhafni, Bashar, Qwaider, Chatrine, Habash, Nizar, Briscoe, Ted
Although Arabic is spoken by over 400 million people, advanced Arabic writing assistance tools remain limited. To address this gap, we present ARWI, a new writing assistant that helps learners improve essay writing in Modern Standard Arabic. ARWI is the first publicly available Arabic writing assistant to include a prompt database for different proficiency levels, an Arabic text editor, state-of-the-art grammatical error detection and correction, and automated essay scoring aligned with the Common European Framework of Reference standards for language attainment. Moreover, ARWI can be used to gather a growing auto-annotated corpus, facilitating further research on Arabic grammar correction and essay scoring, as well as profiling patterns of errors made by native speakers and non-native learners. A preliminary user study shows that ARWI provides actionable feedback, helping learners identify grammatical gaps, assess language proficiency, and guide improvement.
Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection
Qwaider, Chatrine, Alhafni, Bashar, Chirkunov, Kirill, Habash, Nizar, Briscoe, Ted
Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback. Arabic AES systems are particularly challenged by the lack of annotated essay datasets. This paper presents a novel framework leveraging Large Language Models (LLMs) and Transformers to generate synthetic Arabic essay datasets for AES. We prompt an LLM to generate essays across CEFR proficiency levels and introduce controlled error injection using a fine-tuned Standard Arabic BERT model for error type prediction. Our approach produces realistic human-like essays, contributing a dataset of 3,040 annotated essays. Additionally, we develop a BERT-based auto-marking system for accurate and scalable Arabic essay evaluation. Experimental results demonstrate the effectiveness of our framework in improving Arabic AES performance.
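The controlled error-injection step can be illustrated with the simplified sketch below. The paper selects error types with a fine-tuned Standard Arabic BERT model; this toy version instead samples from hand-written character-level operations at a fixed rate, and all names are illustrative.

```python
import random

def inject_errors(tokens, error_rate, error_ops, seed=0):
    """Corrupt a clean synthetic essay at a controlled rate, recording
    which tokens were changed and by which operation."""
    rng = random.Random(seed)   # seeded for reproducible corpora
    out, injected = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < error_rate:
            op_name, op = rng.choice(list(error_ops.items()))
            out.append(op(tok))
            injected.append((i, op_name))   # annotation for the AES corpus
        else:
            out.append(tok)
    return out, injected

# Hypothetical character-level error operations
ops = {
    "deletion": lambda t: t[:-1] if len(t) > 1 else t,
    "doubling": lambda t: t + t[-1],
}
tokens = "hello world".split()
corrupted, injected = inject_errors(tokens, error_rate=1.0, error_ops=ops)
```

Recording the injected positions and types is what turns the synthetic essays into annotated training data rather than mere noise.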
Assessing the validity of new paradigmatic complexity measures as criterial features for proficiency in L2 writings in English
Mallart, Cyriel, Simpkin, Andrew, Ballier, Nicolas, Lissón, Paula, Venant, Rémi, Li, Jen-Yu, Stearns, Bernardo, Gaillat, Thomas
This article addresses Second Language (L2) writing development through an investigation of new grammatical and structural complexity metrics. We explore paradigmatic production in learner English by linking language functions to specific grammatical paradigms. Using the EFCAMDAT as a gold standard and a corpus of French learners as an external test set, we employ a supervised learning framework to operationalise and evaluate seven microsystems (MS). We show that learner levels are associated with the seven microsystems. Evaluation with ordinal regression modelling shows that each MS is significant but has little impact taken individually; taken as a group, however, their influence is substantial. These microsystems and their measurement method suggest that they can serve as components of broader-purpose CALL systems focused on proficiency assessment.